{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9aaf39be-eca2-4f26-b469-0499a1b52648",
   "metadata": {},
   "source": [
    "# Controllable Agents for RAG\n",
    "\n",
    "Adding agentic capabilities on top of your RAG pipeline can allow you to reason over much more complex questions.\n",
    "\n",
    "But a big pain point for agents is the **lack of steerability/transparency**. An agent may tackle a user query through chain-of-thought/planning, which requires repeated calls to an LLM. During this process it can be hard to inspect what's going on, or stop/correct execution in the middle.\n",
    "\n",
    "This notebook shows you how to use our brand-new lower-level agent API, which allows controllable step-wise execution, on top of a RAG pipeline.\n",
    "\n",
    "We showcase this over Wikipedia documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87bd57f6-3804-48b4-b53a-9815a6dfc48c",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "575832d2-292b-4a31-93ba-d4d6ae731880",
   "metadata": {},
   "source": [
    "## Setup Data\n",
    "\n",
    "Here we load a simple dataset of different cities from Wikipedia."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "949ab8a9-c664-496f-ac6a-9085383f113d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    VectorStoreIndex,\n",
    "    SummaryIndex,\n",
    "    SimpleKeywordTableIndex,\n",
    "    SimpleDirectoryReader,\n",
    "    ServiceContext,\n",
    ")\n",
    "from llama_index.schema import IndexNode\n",
    "from llama_index.tools import QueryEngineTool, ToolMetadata\n",
    "from llama_index.llms import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3502864b-4880-47b2-af6a-9931e62aa268",
   "metadata": {},
   "outputs": [],
   "source": [
    "wiki_titles = [\n",
    "    \"Toronto\",\n",
    "    \"Seattle\",\n",
    "    \"Chicago\",\n",
    "    \"Boston\",\n",
    "    \"Houston\",\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ac67685-bbb7-403c-82f9-6aad600dde4e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import requests\n",
    "\n",
    "for title in wiki_titles:\n",
    "    response = requests.get(\n",
    "        \"https://en.wikipedia.org/w/api.php\",\n",
    "        params={\n",
    "            \"action\": \"query\",\n",
    "            \"format\": \"json\",\n",
    "            \"titles\": title,\n",
    "            \"prop\": \"extracts\",\n",
    "            # 'exintro': True,\n",
    "            \"explaintext\": True,\n",
    "        },\n",
    "    ).json()\n",
    "    page = next(iter(response[\"query\"][\"pages\"].values()))\n",
    "    wiki_text = page[\"extract\"]\n",
    "\n",
    "    data_path = Path(\"data\")\n",
    "    if not data_path.exists():\n",
    "        Path.mkdir(data_path)\n",
    "\n",
    "    with open(data_path / f\"{title}.txt\", \"w\") as fp:\n",
    "        fp.write(wiki_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10da881c-c1b0-4613-a114-ec12b8e8449f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load all wiki documents\n",
    "city_docs = {}\n",
    "for wiki_title in wiki_titles:\n",
    "    city_docs[wiki_title] = SimpleDirectoryReader(\n",
    "        input_files=[f\"data/{wiki_title}.txt\"]\n",
    "    ).load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed5f6c4f-723a-4cea-a6ba-b2aac3df7ec3",
   "metadata": {},
   "source": [
    "Define LLM + Service Context + Callback Manager"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a98c988-ae17-40b2-b13d-b19a38df44d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
    "service_context = ServiceContext.from_defaults(llm=llm)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4a4d0c6-f329-44a7-bbc9-6b31c1c84288",
   "metadata": {},
   "source": [
    "## Setup Agent\n",
    "\n",
    "In this section we define our tools and setup the agent."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51d603fb-dd10-4aaa-b860-9622dcf4f387",
   "metadata": {},
   "source": [
    "### Define Toolset\n",
    "\n",
    "Each tool here corresponds to a simple top-k RAG pipeline over a single document / Wikipedia page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b38609b7-8030-4929-8a79-615fbc6d673d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.agent import OpenAIAgent\n",
    "from llama_index import load_index_from_storage, StorageContext\n",
    "from llama_index.node_parser import SentenceSplitter\n",
    "import os\n",
    "\n",
    "node_parser = SentenceSplitter()\n",
    "\n",
    "# Build agents dictionary\n",
    "query_engine_tools = []\n",
    "\n",
    "for idx, wiki_title in enumerate(wiki_titles):\n",
    "    nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])\n",
    "\n",
    "    if not os.path.exists(f\"./data/{wiki_title}\"):\n",
    "        # build vector index\n",
    "        vector_index = VectorStoreIndex(nodes, service_context=service_context)\n",
    "        vector_index.storage_context.persist(\n",
    "            persist_dir=f\"./data/{wiki_title}\"\n",
    "        )\n",
    "    else:\n",
    "        vector_index = load_index_from_storage(\n",
    "            StorageContext.from_defaults(persist_dir=f\"./data/{wiki_title}\"),\n",
    "            service_context=service_context,\n",
    "        )\n",
    "    # define query engines\n",
    "    vector_query_engine = vector_index.as_query_engine()\n",
    "\n",
    "    # define tools\n",
    "    query_engine_tools.append(\n",
    "        QueryEngineTool(\n",
    "            query_engine=vector_query_engine,\n",
    "            metadata=ToolMetadata(\n",
    "                name=f\"vector_tool_{wiki_title}\",\n",
    "                description=(\n",
    "                    \"Useful for questions related to specific aspects of\"\n",
    "                    f\" {wiki_title} (e.g. the history, arts and culture,\"\n",
    "                    \" sports, demographics, or more).\"\n",
    "                ),\n",
    "            ),\n",
    "        )\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9e93f2c-c134-40d1-9526-ea0317a9118a",
   "metadata": {},
   "source": [
    "### Setup OpenAI Agent\n",
    "\n",
    "We setup an OpenAI Agent through its components: an AgentRunner as well as an `OpenAIAgentWorker`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1452368b-9f3a-4da6-8c5e-11266250f6b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.agent import AgentRunner, OpenAIAgentWorker, OpenAIAgent\n",
    "from llama_index.agent.openai.step import OpenAIAgentWorker\n",
    "\n",
    "openai_step_engine = OpenAIAgentWorker.from_tools(\n",
    "    query_engine_tools, llm=llm, verbose=True\n",
    ")\n",
    "agent = AgentRunner(openai_step_engine)\n",
    "# # alternative\n",
    "# agent = OpenAIAgent.from_tools(query_engine_tools, llm=llm, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72a3afb9-d8d2-476c-b2aa-9610bc9aa4a7",
   "metadata": {},
   "source": [
    "## Run Some Queries\n",
    "\n",
    "We now demonstrate the capabilities of our step-wise agent framework. \n",
    "\n",
    "We show how it can handle complex queries, both e2e as well as step by step. \n",
    "\n",
    "We can then show how we can steer the outputs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a40f4b0e-1a6d-4928-8ca3-662339adad0c",
   "metadata": {},
   "source": [
    "### Out of the box"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df3d1698-2bfa-4746-9553-7150c3085fd2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Calling Function ===\n",
      "Calling function: vector_tool_Houston with args: {\n",
      "  \"input\": \"demographics\"\n",
      "}\n",
      "Got output: Houston has a population of 2,304,580 according to the 2020 U.S. census. In 2019, the median household income in Houston was $52,338, and approximately 20.1% of the population lived at or below the poverty line. The city has a diverse age distribution, with a median age of 33.4 in 2019. The housing market in Houston consists of 987,158 housing units, with an estimated 42.3% of Houstonians owning their homes. The median monthly owner costs with a mortgage were $1,646, and the median gross rent from 2015 to 2019 was $1,041.\n",
      "========================\n",
      "\n",
      "=== Calling Function ===\n",
      "Calling function: vector_tool_Chicago with args: {\n",
      "  \"input\": \"demographics\"\n",
      "}\n",
      "Got output: Chicago experienced rapid population growth during its first hundred years, becoming one of the fastest-growing cities in the world. From its founding in 1833 with fewer than 200 people, the population grew to over 4,000 within seven years. By 1890, the population had surpassed 1 million, making Chicago the fifth-largest city in the world at the time. The city's population continued to grow, reaching its highest recorded population of 3.6 million in 1950. However, in the latter half of the 20th century, Chicago's population declined, dropping to under 2.7 million by 2010. The city experienced waves of immigration, with various ethnic groups, including Irish, Italians, Jews, Poles, Greeks, and African Americans from the American South, contributing to the city's diverse population. According to the most recent U.S. census estimates, the largest racial or ethnic groups in Chicago are non-Hispanic White, Black, and Hispanic. Additionally, Chicago has a significant LGBT population and became a sanctuary city in 2012.\n",
      "========================\n",
      "\n"
     ]
    }
   ],
   "source": [
    "response = agent.chat(\n",
    "    \"Tell me about the demographics of Houston, and compare that with the demographics of Chicago\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15e9a8cd-052a-4de0-9abe-e6e111e1c50e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Houston has a larger population compared to Chicago, with 2,304,580 residents in 2020, while Chicago had a population of under 2.7 million in 2010. Houston has a diverse population, with various ethnic groups contributing to its demographics. The median household income in Houston is $52,338, and approximately 20.1% of the population lives at or below the poverty line.\n",
      "\n",
      "Chicago, on the other hand, experienced rapid population growth during its early years, becoming one of the fastest-growing cities in the world. It reached its highest recorded population of 3.6 million in 1950. Chicago has a diverse population as well, with significant racial and ethnic groups including non-Hispanic White, Black, and Hispanic. The city also has a significant LGBT population and became a sanctuary city in 2012.\n",
      "\n",
      "Both cities have diverse populations and offer a range of cultural and ethnic experiences. However, Houston has a larger population and a slightly higher median household income compared to Chicago.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44835357-500f-42b9-a9e3-cf0cd0d30384",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Task ID: d7c5b296-b841-429c-ac86-08ff37129a68\n",
      "Number of steps: 3\n"
     ]
    }
   ],
   "source": [
    "# list the task and steps for visibility\n",
    "tasks = agent.list_tasks()\n",
    "print(f\"Task ID: {tasks[-1].task.task_id}\")\n",
    "completed_steps = agent.get_completed_steps(tasks[-1].task.task_id)\n",
    "print(f\"Number of steps: {len(completed_steps)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc7289cc-962f-4237-a257-8a7c98063f5c",
   "metadata": {},
   "source": [
    "### Test Step-Wise Execution\n",
    "\n",
    "We now break this query down into steps. We first create a task object from the user query.\n",
    "\n",
    "We can then start running through steps - or even interjecting our own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "507612e3-4993-462d-9ac4-d117a4168233",
   "metadata": {},
   "outputs": [],
   "source": [
    "# start task\n",
    "task = agent.create_task(\n",
    "    \"Tell me about the demographics of Houston, and compare that with the demographics of Chicago?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "919eb419-b314-4c1e-b205-ebbf7ab872ee",
   "metadata": {},
   "source": [
    "This returns a `Task` object, which contains the `input`, additional state in `extra_state`, and other fields.\n",
    "\n",
    "Now let's try executing a single step of this task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5853881-240b-4c70-80b0-00fac3a5744c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Calling Function ===\n",
      "Calling function: vector_tool_Houston with args: {\n",
      "  \"input\": \"demographics\"\n",
      "}\n",
      "Got output: Houston has a population of 2,304,580 according to the 2020 U.S. census. In 2017, the estimated population was 2,312,717, and in 2018 it was 2,325,502. The city has a diverse demographic makeup, with a significant number of undocumented immigrants residing in the Houston area, comprising nearly 9% of the city's metropolitan population in 2017. The age distribution in Houston includes a significant number of individuals under 15 and between the ages of 20 to 34. The median age of the city is 33.4. The city has a mix of homeowners and renters, with an estimated 42.3% of Houstonians owning housing units. The median household income in 2019 was $52,338, and 20.1% of Houstonians lived at or below the poverty line.\n",
      "========================\n",
      "\n"
     ]
    }
   ],
   "source": [
    "step_output = agent.run_step(task.task_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e8bfd09-deff-4e49-821c-d9baafad2156",
   "metadata": {},
   "source": [
    "When we inspect the logs and the output, we see that the first part was executed - the demographics of Houston."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fac43316-58d1-42ad-a97a-5744dfa77431",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num completed for task 47c83928-06f5-4c54-9f37-70451d76b675: 1\n"
     ]
    }
   ],
   "source": [
    "completed_steps = agent.get_completed_steps(task.task_id)\n",
    "print(f\"Num completed for task {task.task_id}: {len(completed_steps)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68f39c96-1516-404a-9267-3911f9ab32f9",
   "metadata": {},
   "source": [
    "We can also take a look at the upcoming step.\n",
    "\n",
    "**NOTE**: Currently the input is not shown, since execution of a step purely depends on internal memory. This is something we're working on!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef61dbe8-7277-4fc8-8f60-c95cc8ce0ada",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num upcoming steps for task 47c83928-06f5-4c54-9f37-70451d76b675: 1\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "TaskStep(task_id='47c83928-06f5-4c54-9f37-70451d76b675', step_id='43769c9c-61ed-47a2-84dd-a553ba8dcbba', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "upcoming_steps = agent.get_upcoming_steps(task.task_id)\n",
    "print(f\"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}\")\n",
    "upcoming_steps[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0647b9d-64fb-4eae-b7a4-73e8d857a772",
   "metadata": {},
   "source": [
    "If you wanted to pause execution now, you can - you can take the intermediate results without completing the agent flow!\n",
    "\n",
    "**NOTE**: The `memory` of the agent (`agent.memory`) isn't modified until the task is complete and committed - so if you pause now, the memory won't be committed. This is good in case the execution fails.\n",
    "\n",
    "Let's run the next two steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0cb427d6-9dfc-4c01-894c-c353a7a1d47f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Calling Function ===\n",
      "Calling function: vector_tool_Chicago with args: {\n",
      "  \"input\": \"demographics\"\n",
      "}\n",
      "Got output: Chicago experienced rapid population growth during its first hundred years, becoming one of the fastest-growing cities in the world. From its founding in 1833 with fewer than 200 people, the population grew to over 4,000 within seven years. By 1890, the population had surpassed 1 million, making Chicago the fifth-largest city in the world at the time. The city's population continued to grow, reaching its highest recorded population of 3.6 million in 1950. However, in the latter half of the 20th century, Chicago's population declined, dropping to under 2.7 million by 2010. The city experienced waves of immigration, with various ethnic groups, including Irish, Italians, Jews, Poles, Greeks, and African Americans from the American South, contributing to the city's diverse population. According to the most recent U.S. census estimates, the largest racial or ethnic groups in Chicago are non-Hispanic White, Black, and Hispanic. Additionally, Chicago has a significant LGBT population and became a sanctuary city in 2012.\n",
      "========================\n",
      "\n"
     ]
    }
   ],
   "source": [
    "step_output = agent.run_step(task.task_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea66e441-bd96-4b17-a4b3-f7e899a97653",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "step_output = agent.run_step(task.task_id)\n",
    "print(step_output.is_last)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c81608f1-d1cf-4f7c-af7a-9b2a6617ca0f",
   "metadata": {},
   "source": [
    "Since the steps look good, we are now ready to call `finalize_response`, get back our response.\n",
    "\n",
    "This will also commit the task execution to the `memory` object present in our `agent_runner`. We can inspect it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f3f1032-3d5e-4968-be43-bd8347e72a9c",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = agent.finalize_response(task.task_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "802ae06c-70a6-475d-a3fb-fb4ef4cf04ca",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Houston has a population of 2,304,580 according to the 2020 U.S. census, while Chicago had a population of under 2.7 million in 2010. Both cities have diverse populations with various ethnic groups contributing to their demographics.\n",
      "\n",
      "In terms of age distribution, Houston has a significant number of individuals under 15 and between the ages of 20 to 34, with a median age of 33.4. Chicago's population has a diverse age range as well, but specific age distribution data was not provided.\n",
      "\n",
      "In terms of homeownership, Houston has an estimated 42.3% of residents owning housing units. Data on homeownership in Chicago was not provided.\n",
      "\n",
      "The median household income in Houston is $52,338, while specific income data for Chicago was not provided.\n",
      "\n",
      "Both cities have experienced waves of immigration, contributing to their diverse populations. Chicago has a significant LGBT population and became a sanctuary city in 2012, while specific information about these aspects in Houston was not provided.\n",
      "\n",
      "Overall, both Houston and Chicago have diverse populations with various ethnic groups and age distributions. Houston has a slightly smaller population but a higher homeownership rate and median household income compared to Chicago.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfe9efbf-6e1f-42b2-916d-d1eec566cebf",
   "metadata": {},
   "source": [
    "### Inspect Steps / Tasks\n",
    "\n",
    "We can inspect current and previous tasks and steps.\n",
    "\n",
    "This gives you greater transparency into what the agent has processed!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "717352b3-939a-4574-8a3b-c0a56b4ad447",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    }
   ],
   "source": [
    "tasks = agent.list_tasks()\n",
    "print(len(tasks))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95c2f43b-55d1-4f14-a125-ddfaf8350131",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n"
     ]
    }
   ],
   "source": [
    "task_state = tasks[-1]\n",
    "steps = agent.get_completed_steps(task_state.task.task_id)\n",
    "print(len(steps))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_logan",
   "language": "python",
   "name": "llama_index_logan"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
